SPARK tasking v2
In the previous iteration of this design, I proposed a decentralised tasking system requiring a membership service that provides a list of all active checker nodes. Will suggested a simpler solution that does not require a list of active members. (Slack discussion.)
Assumptions
- Checker nodes are (somewhat) uniformly distributed across all IPv4 /24 subnets.
- Whenever we create a committee of IPv4 /24 subnets, at least X% of the checker nodes in those subnets will be online; we can estimate the value of X, and it does not change too much over time.
Ideas
- Use consistent hashing to split the space of all IPv4 /24 subnets into buckets - committees.
- Extend the length of the SPARK round and let each committee perform multiple retrieval checks per round.
- Assign rewards to checker nodes retroactively after nodes report retrieval results. After the round is over, we know exactly which checker nodes participated in the round - that gives us the membership information.
- Within each subnet and for each job, assign the reward to the node inside that subnet that is “closest” to the job, using the XOR distance between the node’s public key and the hash of the job definition. (See below for details.)
Benefits
- We can have a network of tasker nodes that don’t share any state.
- Tasks are computed deterministically. The fraud detection algorithm can re-run the tasking algorithm to verify which tasks are legitimate.
The algorithm executed by the tasking node
Parameters
- CC - “committee count” - how many committees we want to form. Initially, this will be a hardcoded parameter. We will manually tweak it over time based on the number of active checker nodes.
- TC - “tasks per committee” - how many tasks to define for each committee.
- TN - “tasks per node” - how many tasks a node performs. This value should be a fraction of TC, e.g.
TN = TC/5
We need to find a good balance between TC & TN.
- We want TC to be high to maximise the probability that every checker node receives at least one reward.
- We want to keep TN relatively low to limit the bandwidth consumed by checker nodes and by SPs serving our retrievals.
- However, we may end up with not enough redundant measurements with high TC and low TN, which means we won’t have confidence that the majority is honest.
All three parameters will evolve over time, so we must figure out what telemetry data we must collect to allow us to reason about the suitability of current CC/TC/TN values.
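To make the TC/TN trade-off concrete, here is a small worked example with illustrative (made-up) values, estimating the expected measurement redundancy per task:

```javascript
// Illustrative numbers only - the real values will be tuned over time.
const TC = 50 // tasks per committee (hypothetical)
const TN = TC / 5 // tasks per node, using the suggested fraction -> 10
const nodesPerCommittee = 30 // active checkers in one committee (hypothetical)

// If each node performs TN of the committee's TC tasks and the task selection
// spreads nodes roughly uniformly across tasks, each task is expected to be
// measured by nodesPerCommittee * TN / TC checkers.
const expectedRedundancy = nodesPerCommittee * TN / TC
console.log(expectedRedundancy) // -> 6 measurements per task
```

With only ~6 expected measurements per task in this example, a handful of colluding nodes in one committee could already sway the majority - this is the redundancy concern described above.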
Inputs
- SPARK round
- The IPv4 address of the checker node that’s asking for tasks
Step 1: DRAND Randomness
Inputs
- SPARK round number
Steps
- Map the SPARK round to the DRAND round, using genesis time & round length of SPARK & DRAND.
- Obtain the 256 bits of randomness via the DRAND public API.
Outputs
- DRAND randomness (256 bits, the SHA-256 hash of the signature)
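The round mapping can be sketched as follows. All constants below are illustrative assumptions, not verified production values - the real genesis times and round durations are configuration parameters:

```javascript
// All constants are illustrative assumptions.
const SPARK_GENESIS = 1_700_000_000 // unix seconds when SPARK round 0 started (hypothetical)
const SPARK_ROUND_DURATION = 1_200 // 20-minute SPARK rounds (hypothetical)
const DRAND_GENESIS = 1_595_431_050 // drand mainnet genesis time (assumed)
const DRAND_PERIOD = 30 // seconds per drand round (assumed)

// Map a SPARK round to the latest DRAND round published at or before the
// start of that SPARK round (drand round 1 is published at genesis).
function sparkToDrandRound (sparkRound) {
  const sparkStart = SPARK_GENESIS + sparkRound * SPARK_ROUND_DURATION
  return Math.floor((sparkStart - DRAND_GENESIS) / DRAND_PERIOD) + 1
}

// Each SPARK round advances the DRAND round by a fixed step:
console.log(sparkToDrandRound(1) - sparkToDrandRound(0)) // -> 40
```

The randomness itself is then fetched from the public DRAND HTTP API for the computed round number.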
Step 2: Committee Selection
- Calculate the subnet’s position on the hash ring:
const subnet = ipv4_address.subarray(0, 3) // first three bytes of the node's IPv4 address
const data = concat(randomness, subnet)
const ring_index = await crypto.subtle.digest("SHA-256", data)
- Calculate the bucket size for each committee:
ALL_SUBNETS := 2^24; bucket_size := ALL_SUBNETS / CC;
- Compute the committee the node belongs to:
committee := ring_index / bucket_size; // using big-number arithmetic
Assuming the number of committees is a power of two, i.e. CC = 2^N and CC < 256, this simplifies to:
committee := most_significant_byte_of(ring_index) / (256 / CC)
Outputs
- Index of the committee this node belongs to.
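Putting step 2 together, a minimal sketch (using Node’s `node:crypto` for brevity; `CC` is assumed to be a power of two below 256, as in the shortcut above):

```javascript
import { createHash } from 'node:crypto'

// CC is assumed to be a power of two, CC < 256.
function committeeIndex (randomness, ipv4, CC) {
  // The first three octets of the IPv4 address identify the /24 subnet.
  const subnet = Buffer.from(ipv4.split('.').slice(0, 3).map(Number))
  // Position of the subnet on the hash ring for this round.
  const ringIndex = createHash('sha256')
    .update(Buffer.concat([randomness, subnet]))
    .digest()
  // With CC = 2^N, the committee is determined by the most significant byte.
  return Math.floor(ringIndex[0] / (256 / CC))
}

// Two nodes in the same /24 subnet always land in the same committee:
const randomness = Buffer.alloc(32, 1) // stand-in for DRAND randomness
console.log(
  committeeIndex(randomness, '203.0.113.7', 8) ===
  committeeIndex(randomness, '203.0.113.250', 8)
) // -> true
```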
Step 3: Committee Tasking
We want to define TC different tasks (retrieval checks) for each committee. The particular implementation will depend on how we implement CID sampling. The implementation details are not important here.
Outputs:
- A list of TC tasks (pairs of (CID, SP)) for the given committee and round.
The algorithm executed by the checker node
- Determine the current SPARK round.
TBD: How can a checker node reliably determine what’s the current SPARK round?
- If it’s time-based, then we need a reliable source of current time - the internal clock on desktop computers may be wrong.
- Can the tasker nodes provide API for this?
- Ask the tasker for the DRAND randomness, committee index, and the list of tasks for the current SPARK round. (We obtained the round number in step 1.)
- Pick the TN closest tasks from the list, using the following algorithm to calculate the distance to (cid, sp):
distance := XOR(
  digest("SHA-512", concat(randomness, cid, sp)),
  checker_public_key, // the node's ed25519 public key
)
- For each task in the list, and while we are still within the time window of the current SPARK round, perform the retrieval and report the results.
Q: Do we want to enforce serial execution of retrieval checks, or is it fine if a checker fires all retrieval requests immediately? Serial execution should spread the load on SPs over a longer time period.
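The task-selection step can be sketched as follows. Note that raw ed25519 public keys are 32 bytes while SHA-512 digests are 64, so this sketch pads the key to the digest length before XOR-ing - how the two lengths are reconciled is an open detail of the design:

```javascript
import { createHash } from 'node:crypto'

// Rank tasks by XOR distance between SHA-512(randomness || cid || sp) and the
// checker's public key, then pick the TN closest ones. Comparing the XOR'd
// buffers lexicographically matches comparing big-endian distances.
function pickClosestTasks (tasks, randomness, publicKey, TN) {
  const key = Buffer.alloc(64) // pad the 32-byte key to SHA-512 digest length
  Buffer.from(publicKey).copy(key)
  const distanceTo = ({ cid, sp }) => {
    const digest = createHash('sha512')
      .update(Buffer.concat([randomness, Buffer.from(cid), Buffer.from(sp)]))
      .digest()
    const dist = Buffer.alloc(64)
    for (let i = 0; i < 64; i++) dist[i] = digest[i] ^ key[i]
    return dist
  }
  return tasks
    .map((task) => ({ task, distance: distanceTo(task) }))
    .sort((a, b) => Buffer.compare(a.distance, b.distance))
    .slice(0, TN)
    .map(({ task }) => task)
}
```

Because the distance depends only on the round’s randomness, the task and the public key, every party (checker, tasker, evaluator) computes the same ranking.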
Outputs (measurement record) - per each task
- SPARK round
- Checker’s identity (public key)
- Checker’s IPv4 address, in a form we can trust
- Initially, this will be recorded by the measurement service we are running.
- We need to find a trustless solution in the future. For example, tasker services can use their identity to sign (randomness, ipv4). There may be better options, like leveraging TCP or QUIC handshake messages - that is a research task for later.
- CID
- SP
- other fields as described in
Verification of measurements by the evaluation service
After the SPARK round is over and all measurements are recorded, we can gather membership information from the reported data and decide which measurements will be used for further proof verification and become eligible for rewards.
- Determine the SPARK round we are evaluating.
- Build a list of all IPv4 subnets that participated in this round.
- For each IPv4 subnet, compute the committee index it belongs to. (See Step 2: Committee Selection.)
- For each committee index seen, obtain the list of TC tasks assigned to this committee.
- This depends on how we implement CID sampling.
- For the initial version, the tasking service can provide an API to retrieve the list of all committees and their tasks for the given SPARK round.
get_tasks(round) -> Map<CommitteeIndex, Array<Task>>
- Build a lookup table allowing us to quickly check whether (subnet, cid, sp) is a valid task.
- For each measurement, check if (subnet, cid, sp) is valid. Filter out records that don’t pass.
- Group all measurements by their (subnet, cid, sp) triple.
- For each group (subnet, cid, sp):
- Find the closest measurement, using the same distance algorithm as described above:
distance := XOR(
  digest("SHA-512", concat(randomness, cid, sp)),
  checker_public_key, // the node's ed25519 public key
)
- Flag the closest measurement as valid; filter out all other measurements for the same task.
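Steps 5-8 above can be sketched as a lookup table plus a single pass over the measurements. `committeeOf` and `distanceOf` below are hypothetical helpers standing in for the committee-index and XOR-distance computations described earlier:

```javascript
// measurements: [{ subnet, cid, sp, ... }]
// tasksByCommittee: Map<committeeIndex, Array<{ cid, sp }>>
function evaluateRound (measurements, tasksByCommittee, committeeOf, distanceOf) {
  // Step 5: lookup table of valid (subnet, cid, sp) triples.
  const validTasks = new Set()
  for (const { subnet } of measurements) {
    const tasks = tasksByCommittee.get(committeeOf(subnet)) ?? []
    for (const { cid, sp } of tasks) validTasks.add(`${subnet}|${cid}|${sp}`)
  }
  // Steps 6-8: drop invalid measurements, group by (subnet, cid, sp),
  // and keep only the closest measurement in each group.
  const winners = new Map()
  for (const m of measurements) {
    const key = `${m.subnet}|${m.cid}|${m.sp}`
    if (!validTasks.has(key)) continue // step 6: not a valid task
    const best = winners.get(key)
    if (!best || distanceOf(m) < distanceOf(best)) winners.set(key, m)
  }
  return [...winners.values()]
}
```

The returned list satisfies the output properties below: every surviving measurement is for a valid task, and there is at most one measurement per (subnet, cid, sp).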
Output
List of measurements with the following properties:
- Each measurement is for a valid task
- For each task, there is at most one measurement per IPv4 subnet.
Computational Complexity
TL;DR:
- Worst case: O(M * log(M)), where M is the number of measurements submitted for the given epoch
- Optimistic: O(N * log(N)), where N is the number of checker nodes participating in the given epoch
Details
- We need to iterate through all measurements multiple times - O(M).
- We compute the committee index for each IPv4 subnet. Assuming the number of subnets actively participating in the round is S, the complexity is O(S). However, S is limited to 2^24 (~16 million).
- In step 5, we want to build a set of all valid (subnet, cid, sp) triples. There will be at most TC valid tasks for each subnet, which gives us TC*S triples. Assuming a data structure with logarithmic complexity for inserts and lookups, step 5 has the complexity O(TC*S*log(TC*S)).
- In step 6, we are filtering out invalid measurements using the lookup table from step 5. This has the complexity O(M*log(TC*S)). While TC*S is capped, M is not.
- In step 7, we want to group valid measurements into groups by (subnet, cid, sp). A naive algorithm that sorts the measurements by comparing (subnet, cid, sp) to create an array where groups are laid out next to each other requires O(M*log(M)).
- In step 8, we want to find the closest measurement in each group.
- In each group of measurements, we can find the closest measurement in linear time.
- Therefore, the entire step 8 requires linear time, too → O(M).
Where:
- M is the number of measurements submitted for the given epoch
- S is the number of IPv4 subnets participating in the given epoch
- TC is our parameter (number of tasks per committee)
- N is the number of checker nodes participating in the given epoch
Invariants:
- M ≥ N ≥ S
Worst-case
Combined, we are getting the following worst-case complexity:
O(M + S + TC*S*log(TC*S) + M*log(TC*S) + M*log(M) + M)
Which can be simplified to
O(M*(log(TC*S)+log(M)) + S + TC*S*log(TC*S)) =
O(M*log(TC*S) + M*log(M) + S + TC*S*log(TC*S)) =
O(log(TC*S)*(M + TC*S) + M*log(M))
Treating TC as a constant:
O(log(S)*(M+S) + M*log(M)) =
O(M*log(S) + M*log(M) + S*log(S)) = # M >= S
O(M*log(M))
Optimistic
In the optimistic case:
- Each node submits TN measurements, therefore M = N * TN.
- Nodes are equally distributed between subnets, therefore S = O(N).
Note that TN is a fraction of TC, so TN = O(TC) and we can use TC in place of TN as the coefficient inside the big-O bounds.
The time complexity is
O(N * TC * (log(TC*N) + log(N * TN)) + N + TC*N*log(TC*N)) =
O(N * TC * (log(TC) + log(N) + log(N) + log(TN)) + N + TC*N*log(TC*N)) =
O(N * TC * (log(TC) + log(TN) + log(N)) + N + N * TC * (log(TC) + log(N))) =
O(N * TC * (log(TN) + log(TC) + log(N)) + N)
Assuming N >> TN and N >> TC and treating both TN and TC as constants:
O(N * log(N))
Space (RAM) Complexity
I am excluding the space to store the raw measurements.
TL;DR:
- O(S + M), where S is the number of participating IPv4 subnets and M is the number of measurements.
- Keeping all measurements in memory will quickly become infeasible. We need to research an algorithm for finding the closest measurement in each group that will trade time for space, preferably leveraging features offered by the database storing the measurements.
Details
- In steps 2 & 3, we build a list of all participating IPv4 subnets - that’s O(S).
- In step 4, we obtain a list of tasks for each committee - that’s O(CC*TC) (a constant).
- In step 5, we build a set of all valid (subnet, cid, sp) triples. There will be at most S*TC triples to store - O(S*TC).
- In step 7, we group measurements and then run some more steps for each group. We can store all groups in a list of all measurements - O(M).
Where:
- M is the number of measurements submitted for the given epoch
- S is the number of IPv4 subnets participating in the given epoch
- TC is our parameter (number of tasks per committee)
- N is the number of checker nodes participating in the given epoch
Invariants:
- M ≥ N ≥ S
All together:
O(S + CC*TC + S*TC + M) = # next, ignore constants
O(S + M)